Crowding, the identification difficulty for a target in the presence of nearby flankers, is ubiquitous in spatial vision and is considered a bottleneck 
of object recognition and visual awareness. Despite its significance, the neural mechanisms of crowding are still unclear. Here, we performed 
event-related potential and fMRI experiments to measure the cortical interaction between the target and flankers in human subjects. We found 
that the magnitude of the crowding effect was closely associated with an early suppressive cortical interaction. The cortical suppression was 
reflected in the earliest event-related potential component (C1), which originated in V1, and in the BOLD signal in V1, but not other higher 
cortical areas. Intriguingly, spatial attention played a critical role in the manifestation of the suppression. These findings provide direct and 
converging evidence that attention-dependent V1 suppression contributes to crowding at a very early stage of visual processing. 
Introduction 

When a target is presented with nearby flankers in the peripheral 
visual field, it becomes harder to identify, which is referred to as 
crowding. Crowding is a form of inhibitory interaction that is ubiq- 
uitous in spatial vision, and it has been reported to occur with vari- 
ous kinds of stimuli and tasks (Levi, 2008; Whitney and Levi, 2011). 
Studying crowding can advance our understanding of conscious vi- 
sion and object recognition throughout the visual field. 

Despite the significance of crowding, its mechanisms are still 
unclear. Based on psychophysical findings, various theories have 
been proposed to explain crowding at multiple levels. Some the- 
ories attribute crowding to early visual cortical interaction. They 
propose that crowding occurs when the target and flanker over- 
lap within the same neural unit (Flom et al., 1963; Levi et al., 1985; 
Pelli, 2008) or are represented by different populations of neu- 
rons with long-range horizontal connections (Levi, 2008). These 
theories suggest that crowding influences the representation of 
the target in early visual processing stages. On the other hand, 
attention theories argue that crowding could be ascribed to the 
coarse resolution of spatial attention (He et al., 1996) or to unfocused 
spatial attention (Strasburger, 2005). In these theories, the effect of 
crowding on the target representation arises in late processing stages. 

To date, very few neurophysiological studies have attempted 
to investigate the neural mechanisms of crowding (Fang and He, 
2008; Bi et al., 2009; Freeman et al., 2011; Anderson et al., 2012; 
Millin et al., 2013). A major obstacle is the difficulty in isolating 
neural signals induced by the target from those by flankers. This is 
because cortical areas responding to the peripheral target and 
flankers are hard to separate, especially with current brain imag- 
ing techniques. Several fMRI studies (Freeman et al., 2011; An- 
derson et al., 2012; Millin et al., 2013) showed that crowding 
attenuated BOLD signals in early visual cortex, as early as in V1. 
However, because of the low temporal resolution of fMRI, it is 
unclear whether the attenuation originates in V1 or reflects top-down 
feedback from higher cortical areas. Moreover, apart from a conference 
presentation by Tjan et al. (2012), no existing study has investigated 
an important diagnostic criterion for crowding, the radial-tangential 
anisotropy. 

We performed event-related potential (ERP) and fMRI exper- 
iments to address these issues. In these experiments, we circum- 
vented the isolation difficulty with novel experimental designs. In 
the ERP experiments, we examined whether the inhibitory inter- 
action (or cortical suppression) between the target and flankers 
could be reflected in the C1 component. This would clarify the 
bottom-up versus top-down issue because C1 is the earliest ERP 
component and is thought to be generated mainly by feedforward 
neuronal responses in V1 (Clark and Hillyard, 1996). The fMRI 
experiments were designed to complement and corroborate the 
ERP experiments. We examined how the cortical suppression 
was reflected in BOLD signals in V1-V4, lateral occipital area 
(LO), and intraparietal sulcus (IPS). To explore how attention 
contributes to crowding, we also compared the conditions when 
subjects paid or did not pay attention to the stimuli in all these 
experiments. 
Materials and Methods 


Subjects. There were 20 subjects (14 male) in Experiment 1, 20 (12 male) 
in Experiment 2, 10 (4 male) in Experiment 3, and 10 (6 male) in Exper- 
iment 4. All subjects were right-handed and reported normal or 
corrected-to-normal vision. Ages ranged from 18 to 27 years. They gave 
written, informed consent in accordance with the procedures and pro- 
tocols approved by the human subjects review committee of Peking 
University. 

Stimuli. All the targets and flankers were circular sinusoidal gratings 
(diameter: 2.36°; spatial frequency: 2.54 cycles/°; Michelson contrast: 1; 
mean luminance: 61.47 cd/m²). The background luminance was also 
61.47 cd/m². In all experiments, the target was centered at 8° eccentricity 
in the upper left visual quadrant. We presented the stimuli in the upper 
visual field, rather than the lower visual field, for two reasons: (1) crowding 
is stronger in the upper visual field than in the lower visual field (He 
et al., 1996); and (2) it is easier to separate C1 and the following positive 
P1 component with upper visual field stimuli, as the C1 induced by 
stimuli in the upper visual field has a negative polarity, whereas the C1 
induced by stimuli in the lower visual field has a positive polarity (Clark 
et al., 1994). The orientation of the target was 45° ± θ, tilted either left or 
right. θ was predetermined by a psychophysical test (see below). The 
orientations of the flankers were independently and randomly selected 
from 0° to 180° for each trial. Subjects were asked to maintain fixation on 
a black dot at the center of the display throughout the experiments. 

In Experiment 1, there were five stimuli: target only (T), target with 
nearby flankers (Near_T+F), target with far flankers (Far_T+F), nearby 
flankers only (Near_F), and far flankers only (Far_F) (see Fig. 1A). The 
flankers were positioned in the radial direction with respect to fixation. 
The center-to-center distance between the flankers and the target was 
2.48° in the Near_T+F stimulus and 5.07° in the Far_T+F stimulus. 
Experiment 2 also had five stimuli: target only (T), target with flankers 
positioned radially (Rad_T+F), target with flankers positioned tangen- 
tially (Tan_T+F), radial flankers only (Rad_F), and tangential flankers 
only (Tan_F) (see Fig. 1B). In both the Rad_T+F and the Tan_T+F 
stimuli, the center-to-center distance between the flankers and the target 
was 2.36°. In Experiments 3 and 4, the stimuli were identical to those in 
Experiments 1 and 2, respectively, except T was not used. 

θ was the orientation discrimination threshold (75% correct) for the 
target in the Far_T+F stimulus (Experiments 1 and 3) and the Tan_T+F 
stimulus (Experiments 2 and 4). To measure the threshold, a stimulus 
(Far_T+F or Tan_T+F) was presented for 250 ms. The orientation of 
the target was either 45° + θ or 45° − θ. Subjects were asked to judge the 
orientation of the target relative to 45° (clockwise or counterclockwise). 
θ varied trial by trial and was controlled by the QUEST staircase 
(Watson and Pelli, 1983). 

ERP experiments. The procedures of Experiments 1 and 2 were identi- 
cal, except that different stimuli were used. Visual stimuli were displayed 
on a ViewSonic color graphic monitor (refresh rate: 60 Hz; resolution: 
1024 × 768; size: 22 inches) with a gray background at a viewing distance 
of 73 cm. A chin rest was used to stabilize subjects’ head position. 

Each trial began with one of the five stimuli (T, Near_T+F, Far_T+F, 
Near_F, and Far_F in Experiment 1 and T, Rad_T+F, Tan_T+F, Rad_F, 
and Tan_F in Experiment 2) presented in the upper-left visual quadrant 
for 250 ms. Then, after a 450-650 ms blank interval, a grating whose 
orientation slightly deviated from the vertical was presented for 100 ms in 
the lower-right visual quadrant. Two low-contrast dashed circles, one at 
the same location as the target in the first stimulus and the other at the 
same location as the grating in the second stimulus, were always pre- 
sented on the screen to indicate the positions of the target and the second 
grating, respectively (see Fig. 1C). 

Both Experiments 1 and 2 consisted of two sessions: the attended 
session and the unattended session. In these two sessions, subjects viewed 
the same stimuli but performed different tasks. In the attended session, 
subjects were instructed to pay attention to the upper left visual quad- 
rant, respond to the first stimulus, and ignore the second stimulus. If the 
stimulus contained a target, subjects needed to press one of two buttons 
to indicate the orientation of the target relative to 45° (clockwise or 
counterclockwise). If the stimulus contained only the flankers, subjects 
pressed a button randomly. In the unattended session, subjects were 
instructed to pay attention to the lower right visual quadrant, ignore the 
first stimulus, and respond to the orientation of the second stimulus 
relative to the vertical (left or right). The two sessions were performed on 
different days and were counterbalanced across subjects. In each session, 
there were 20 blocks of 100 trials, with 20 trials for each of the five stimuli. 

Scalp EEG was recorded from 64 Ag/AgCl electrodes positioned accord- 
ing to the extended international 10-20 EEG system. Vertical electro- 
oculogram was recorded from an electrode placed above the right eye. 
Horizontal EOG was recorded from an electrode placed at the outer canthus 
of the left eye. Electrode impedance was kept <5 kΩ. EEG was amplified with 
a gain of 500 K, bandpass filtered at 0.05-100 Hz, and digitized at a sampling 
rate of 1000 Hz. The signals on these electrodes were referenced online to the 
nose and were rereferenced offline to the average of the two mastoids. 

Offline data analysis focused on the EEG signals induced by the first 
stimulus, using Brain Vision Analyzer (Brain Products). EEG data were 
first low-pass filtered at 30 Hz and then epoched starting at 100 ms before 
stimulus onset and ending at 200 ms after stimulus onset. Each epoch was 
corrected for baseline over the 100 ms prestimulus interval. The epochs 
contaminated by eye blinks, eye movements, or muscle potentials exceeding 
±50 μV at any electrode were excluded from analysis. Remaining 
epochs were selectively averaged according to the stimulus 
conditions. To select electrodes for the C1 amplitude and latency analy- 
sis, grand averaged ERPs were made by averaging across subjects and 
stimulus conditions but separately for the two sessions. Five electrodes 
with the largest C1 amplitudes were chosen for further analysis. To quan- 
tify the C1 amplitude and latency for each stimulus and each subject, the 
waveforms at these five electrodes were first averaged to obtain a mean 
waveform. The mean amplitude of the 11 sampling points around the C1 
peak of the mean waveform was defined as the C1 amplitude. The C1 
latency was the peak latency of the mean waveform. 
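
The epoch cleaning and C1 quantification steps described above can be illustrated with a minimal NumPy sketch. It is not the Brain Vision Analyzer pipeline used in the study. The function names, the synthetic data, the 50-90 ms peak-search window, and the choice to center the 11 samples symmetrically on the peak are all assumptions made for illustration.

```python
import numpy as np

def baseline_correct(epochs, n_baseline):
    """Subtract each channel's mean over the first n_baseline (prestimulus) samples.

    epochs has shape (n_epochs, n_channels, n_samples).
    """
    baseline = epochs[:, :, :n_baseline].mean(axis=2, keepdims=True)
    return epochs - baseline

def reject_artifacts(epochs, threshold_uv=50.0):
    """Keep only epochs whose absolute voltage stays within the threshold
    at every electrode and every sample."""
    keep = np.abs(epochs).max(axis=(1, 2)) <= threshold_uv
    return epochs[keep]

def c1_amplitude_latency(waveform, times_ms, window_ms=(50, 90), n_points=11):
    """Locate the most negative sample in the search window (C1 is negative
    for upper-field stimuli) and average n_points samples centered on it."""
    idx = np.where((times_ms >= window_ms[0]) & (times_ms <= window_ms[1]))[0]
    peak = idx[np.argmin(waveform[idx])]
    half = n_points // 2
    amplitude = waveform[peak - half: peak + half + 1].mean()
    return amplitude, times_ms[peak]

# Synthetic demo (hypothetical data): 3 epochs, 5 electrodes, 1000 Hz sampling,
# epoch from -100 ms to +200 ms, negative deflection peaking at 77 ms.
times_ms = np.arange(-100, 201)
c1_shape = -2.0 * np.exp(-((times_ms - 77.0) ** 2) / (2 * 10.0 ** 2))
epochs = np.tile(c1_shape, (3, 5, 1)) + 5.0   # constant DC offset of 5 uV
epochs[2, 0, 250] += 80.0                     # artifact in the third epoch

clean = reject_artifacts(baseline_correct(epochs, n_baseline=100))
mean_waveform = clean.mean(axis=(0, 1))       # average epochs, then electrodes
amplitude, latency = c1_amplitude_latency(mean_waveform, times_ms)
```

In this sketch, baseline correction is applied before rejection so that a constant offset does not count toward the ±50 μV criterion; the source does not state the order used in the actual analysis.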

Estimation of the dipole sources was performed using the BESA algo- 
rithm, as described by Clark et al. (1994). The C1 component was mod- 
eled on the grand-averaged waveforms elicited by all five stimuli. The 
waveform in the 3 ms interval around the peak amplitude (between 76 
and 78 ms in Experiment 1, 77 and 79 ms in Experiment 2) was simulated 
with one dipole with free location and orientation. 

fMRI experiments. Experiment 3 used an event-related design and had 
two sessions: the attended session and the unattended session. Each ses- 
sion consisted of eight functional runs of 128 continuous trials (2 s for 
each trial). In these two sessions, subjects viewed the same stimuli but 
performed different tasks. In the attended session, each run began with a 
12 s fixation period and ended with a 14 s fixation period, thus lasting 
282 s. The order of the three types of trials (blank, far, and nearby) in each 
run was balanced using M-sequence (Buracas and Boynton, 2002). Spe- 
cifically, a four condition M-sequence was adopted, with one condition 
for far trials, one condition for nearby trials, and two conditions for blank 
trials, such that subjects would not feel time-pressed to perform the task. 
For each of the far and nearby conditions, there were 32 trials in each run 
and 256 trials (32 × 8) in total. In a far trial, the Far_T+F and Near_F 
stimuli were presented successively in a random order, each for 0.25 s. In 
a nearby trial, the Near_T+F and Far_F stimuli were presented in the 
same way (see Fig. 4A). In the following 1.5 s, subjects performed the 
same orientation discrimination task with the target as that in the ERP 
experiments. In a blank trial, only the fixation point was presented for 2 s. 
In the unattended session, subjects were asked to ignore the stimuli and 
detect a brief luminance change at the fixation point. A dashed circle at 
the location of the target was always presented on the screen to indicate 
the position of the target. The procedure of Experiment 4 was identical to 
that of Experiment 3, but different stimuli (Rad_T+F, Tan_T+F, Rad_F, 
and Tan_F) were used. In a radial trial, the Rad_T+F and Tan_F stimuli 
were presented. In a tangential trial, the Tan_T+F and Rad_F stimuli 
were presented (see Fig. 6A). 
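
As a quick consistency check on the run structure described above, the following sketch recomputes the run duration and trial counts from the stated parameters. The variable names are illustrative only, and the even split of trials across the four M-sequence conditions is inferred from the reported 32 trials per condition per run.

```python
# Parameters stated in the text for each event-related run.
LEAD_IN_S = 12        # initial fixation period
LEAD_OUT_S = 14       # final fixation period
TRIALS_PER_RUN = 128
TRIAL_DURATION_S = 2
N_RUNS = 8
N_MSEQ_CONDITIONS = 4  # far, nearby, and two blank conditions

run_duration_s = LEAD_IN_S + TRIALS_PER_RUN * TRIAL_DURATION_S + LEAD_OUT_S
trials_per_condition_per_run = TRIALS_PER_RUN // N_MSEQ_CONDITIONS
trials_per_condition_total = trials_per_condition_per_run * N_RUNS
blank_trials_per_run = 2 * trials_per_condition_per_run

print(run_duration_s, trials_per_condition_per_run, trials_per_condition_total)
```

The computed values agree with the 282 s run length and the 32 and 256 trial counts reported in the text.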

Retinotopic visual areas (V1, V2, V3, and V4) were defined by a 
standard phase-encoded method developed by Sereno et al. (1995) 
and Engel et al. (1997), in which subjects viewed rotating wedge and 
expanding ring stimuli that created traveling waves of neural activity 
in visual cortex. For both Experiments 3 and 4, a block-design run was 
used to localize the ROIs in V1-V4, LO, and IPS, corresponding 
to the area covered by the four flankers and the target. The run 
consisted of 12 12-s stimulus blocks, interleaved with 12 12-s blank intervals. 
In a given stimulus block, subjects passively viewed images of 
colorful natural scenes, which had the same shape, size, and location as 
the target and flankers (see Figs. 4C and 6C). The images appeared at a 
rate of 8 Hz. 

Figure 1. Stimuli, design, and psychophysical results for Experiments 1 and 2. A, Stimuli in Experiment 1. T, Target; F, flanker. “Near” and “Far” indicate the distance between the target and the 
flankers. Black dot represents the fixation point. The stimuli were presented in the upper left visual quadrant. B, Stimuli in Experiment 2. Tan, Tangential; Rad, radial. C, Protocol of Experiments 1 and 
2. Subjects performed an orientation discrimination task either with the target grating in the first stimulus in the attended session or with the grating in the second stimulus in the unattended 
session. D, Psychophysical result for Experiment 1. E, Psychophysical result for Experiment 2. **p < 0.01, statistically significant difference between stimulus conditions. ***p < 0.001, statistically 
significant difference between stimulus conditions. Error bars indicate 1 SEM across subjects. 

MRI data were collected using a 3T Siemens Trio scanner with a 12- 
channel phase-array coil. In the scanner, the stimuli were back-projected 
via a video projector (refresh rate: 60 Hz; spatial resolution: 1024 × 768) 
onto a translucent screen placed inside the scanner bore. Subjects viewed 
the stimuli through a mirror located above their eyes. The viewing distance 
was 83 cm. BOLD signals were measured with an echo-planar 
imaging sequence (TE: 30 ms; TR: 2 s; FOV: 192 × 192 mm²; matrix: 
64 × 64; flip angle: 90°; slice thickness: 3 mm; gap: 0 mm; number of slices: 
33; slice orientation: axial). The fMRI slices covered the occipital lobe, 
most of the parietal lobe, and part of the temporal lobe. A high-resolution 
3D structural dataset (3D MPRAGE; 1 × 1 × 1 mm³ resolution) was 
collected in the same session before the functional runs. For both Experiments 
3 and 4, subjects underwent three sessions: the retinotopic mapping 
session, the attended session, and the unattended session. 

The anatomical volume for each subject in the retinotopic mapping 
session was transformed into the AC-PC space and then inflated using 
BrainVoyager QX (Brain Innovation). Functional volumes in all sessions 
for each subject were preprocessed, including 3D motion correction, 
linear trend removal, and high-pass (0.015 Hz) filtering using BrainVoy- 
ager QX. The images were then aligned to the anatomical volume in the 
retinotopic mapping session. A GLM procedure was used for selecting 
ROIs. The ROIs in V1-V4, LO, and IPS were defined as areas that responded 
more strongly to the natural scene images than to a blank screen 
(p < 10⁻⁴, uncorrected). 

Event-related BOLD signals were calculated separately for each sub- 
ject, following the method used by Kourtzi and Kanwisher (2000). For 
each event-related run, the time course of the MR signal intensity was 
first extracted by averaging the data from all the voxels within the pre- 
defined ROI. The average event-related time course was then calculated 
for each type of trial. Specifically, in each run, we averaged the signal 
intensity across the trials for each trial type at each of 9 corresponding 
time points (volumes) starting from the stimulus onset. These event-related 
time courses of the signal intensities were then converted to time 
courses of percentage signal change for each type of trial by subtracting 
the corresponding value for the blank trials and then dividing by 
that value. The resulting time course for each type of trial was then 
averaged across runs for each subject and then across subjects. In the 
psychophysical, ERP, and fMRI data analyses, Bonferroni correction was 
applied to t tests involving multiple comparisons. 
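
The conversion from ROI signal intensity to percentage signal change can be sketched as follows. This is a simplified illustration of the averaging procedure described above, not the authors' analysis code. The function names, the toy time course, and the trial onsets are hypothetical, and onsets are expressed as volume indices on the assumption that each 2 s trial spans one volume (TR = 2 s).

```python
import numpy as np

def event_related_average(roi_timecourse, onsets, n_timepoints=9):
    """Average the ROI signal across trials at each of n_timepoints volumes
    counted from trial onset (onsets are volume indices)."""
    segments = [roi_timecourse[o:o + n_timepoints]
                for o in onsets
                if o + n_timepoints <= len(roi_timecourse)]
    return np.mean(segments, axis=0)

def percent_signal_change(condition_avg, blank_avg):
    """Subtract the blank-trial value at each time point, divide by it,
    and express the result as a percentage."""
    return 100.0 * (condition_avg - blank_avg) / blank_avg

# Toy run (hypothetical numbers): flat signal of 1000 with a +10 response
# two volumes after each "far" trial onset.
roi = np.full(40, 1000.0)
far_onsets = [5, 25]
blank_onsets = [12, 30]
for onset in far_onsets:
    roi[onset + 2] += 10.0

far_avg = event_related_average(roi, far_onsets)
blank_avg = event_related_average(roi, blank_onsets)
far_psc = percent_signal_change(far_avg, blank_avg)
```

In the study, time courses computed this way were then averaged across runs within each subject and across subjects; that step is omitted here for brevity.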


Results 

Experiment 1: C1 suppression and the target-flanker distance 
It is well known that the crowding zone extends to approximately 
half the target eccentricity (Bouma, 1970, 1973). That is, crowd- 
ing is significantly stronger when the target is presented with 
nearby flankers than with far flankers. If the cortical suppression 
between the target and flankers contributes to crowding, we pre- 
dict a stronger suppression in the nearby condition relative to the 
far condition. We conducted the first ERP experiment to test this. 

Five stimuli (Fig. 1A) were used, including target only (T), 
target with nearby flankers (Near_T+F), target with far flankers 
(Far_T+F), nearby flankers only (Near_F), and far flankers only 
(Far_F). The target was centered at 8° eccentricity in the upper 
left visual quadrant, and its orientation was ~45°. The orienta- 
tions of the flankers were randomly selected for each trial. In a 
given trial, one of the five stimuli was presented for 250 ms. Then, 
after a 450-650 ms blank interval, a grating was presented for 100 
ms in the lower-right visual quadrant. The orientation of the 
grating slightly deviated from the vertical (Fig. 1C). 

Experiment 1 consisted of two sessions: the attended session and 
the unattended session. In these two sessions, subjects viewed the 
same stimuli but performed different tasks. In the attended session, 
subjects always paid attention to the upper left visual quadrant and 
responded to the first stimulus. If the stimulus contained a target, 
subjects pressed one of two buttons to indicate the orientation of the 
target relative to 45° (clockwise or counterclockwise). If the stimulus 
contained the flankers only, subjects pressed a button randomly. In 
the unattended session, subjects always paid attention to the lower 
right visual quadrant and judged the orientation of the second stimulus 
relative to the vertical (left or right). 

Figure 2. ERP results for the attended and unattended sessions in Experiment 1. A, C1 topographies in response to the first stimulus averaged over all five stimulus conditions and all subjects. 
Posterior electrodes, including P1, P3, PO3, PO7, and O1 (within the black ellipse), had the largest C1 amplitudes. B, Locations of a single dipole that best accounted for the variance in the C1 scalp 
voltage distribution. C, ERPs averaged over the five electrodes and all subjects for each stimulus condition. D, Suppression indices when the target was presented with the nearby or far flankers. E, 
Suppression indices computed separately for the half of the EEG data in the attended session that exhibited a strong behavioral crowding effect (nearby vs far) and for the other half, which exhibited a weak effect. *p < 0.05, 
statistically significant difference between stimulus conditions. **p < 0.01, statistically significant difference between stimulus conditions. Error bars indicate 1 SEM across subjects. 

In the attended session, subjects’ response accuracies with the 
T, Far_T+F, and Near_T+F stimuli were 81%, 79%, and 69%, 
respectively (Fig. 1D). Performance differences were significant 
between the Near_T+F and T stimuli (t(19) = 7.32, p < 0.001) 
and between the Near_T+F and Far_T+F stimuli (t(19) = 7.04, 
p < 0.001), demonstrating that the presentation of the nearby 
flankers led to evident crowding and that the crowding effect was 
modulated by the target-flanker distance. 

We focused ERP data analysis on the C1 component evoked 
by the first stimulus because C1 is the earliest ERP component 
(onset latency 50-55 ms) and is known to reflect the feedforward 
response of neurons in V1 (Clark and Hillyard, 1996; Martinez et 
al., 1999; Pourtois et al., 2004; Bao et al., 2010). Figure 2A shows 
its topographies averaged over all the five stimuli in the attended 
and unattended sessions. Dipole modeling confirmed that the 
intracranial source of the C1 component was located in V1 (Fig. 
2B; Table 1). Consistent with previous studies (Clark et al., 1994; 
Bao et al., 2010), left posterior electrodes, including P1, P3, PO3, 
PO7, and O1, had the largest C1 amplitudes. Figure 2C shows the 
averaged waveforms across the five electrodes. The C1 compo- 
nent was visible between 50 and 90 ms after stimulus onset and 
had a peak latency of ~77 ms. Statistical analyses were based on 
the mean C1 amplitudes and latencies across these five electrodes. 

Table 1. Information on a single dipole that best accounted for the variance in the 
C1 scalp voltage distribution 

Experiment     Session      Talairach coordinates   Percentage of variance accounted for
Experiment 1   Attended     10.0, −83.0, 1.0        92.3
               Unattended   13.8, −90.1, 2.7        90.1
Experiment 2   Attended     14.0, −81.0, 11.0       94.7
               Unattended   15.0, −81.0, 13.0       93.0

The target and flankers were represented by three relatively 
separate neuronal populations in the lower bank of the right 
calcarine sulcus where V1 is located. The electrical currents from 
the populations were conducted through the brain and were 
summated on the scalp, generating the C1 component we observed 
here. Fu et al. (2010) observed that, when two stimuli were 
presented in the left and right visual fields, respectively (assuming 
little interaction between them), the C1 amplitude evoked by the 
simultaneous presentation of the two stimuli was equal to the 
sum of the C1 amplitudes evoked by presenting the stimuli separately. 
Similar to previous neurophysiological (Moran and Desimone, 
1985; Miller et al., 1993; Luck et al., 1997) and fMRI 
studies (Kastner et al., 1998), we defined a suppression index 
between the target and flankers as (C1_T + C1_F) − C1_{T+F} for the 
nearby and far conditions. C1_T and C1_F were the C1 amplitudes 
evoked by the target (T) and flankers (Near_F or Far_F), respectively. 
C1_{T+F} was the C1 amplitude evoked by presenting the 
target and flankers simultaneously (Near_T+F or Far_T+F). If 
there is mutual suppression between the simultaneously presented 
target and flankers, the absolute value of C1_{T+F} should be 
less than that of C1_T + C1_F. Because the stimuli were presented in 
the upper visual field and the C1 components had a negative 
polarity, the suppression index should be <0. The more negative 
(lower) the index, the stronger the suppression. The suppression 
indices in Experiment 1 were either close to 0 or negative (Fig. 
2D). In the attended session, the suppression index for the nearby 
condition was significantly lower than that for the far condition 
(t(19) = 2.65, p < 0.05). However, there was no significant difference 
between the two conditions in the unattended session (t(19) = 
0.33, p = 0.75). The suppression indices were submitted to a 
repeated-measures ANOVA with attention status (attended and unattended) 
and distance (far and nearby) as within-subject factors. 
We found a significant interaction between attention status and distance 
(F(1,19) = 4.37, p < 0.05). These findings demonstrate that, 
parallel to the behavioral crowding effect, suppression can be modulated 
by target-flanker distance. Moreover, spatial attention played 
a significant role in the manifestation of this suppression. 
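
To make the sign convention of the suppression index concrete, here is a minimal sketch. The function name and the amplitude values are hypothetical and chosen only to illustrate the arithmetic; they are not data from the study.

```python
def suppression_index(c1_target, c1_flankers, c1_target_plus_flankers):
    """Suppression index: (C1_T + C1_F) - C1_{T+F}.

    With negative-going C1 amplitudes (upper visual field stimuli), a value
    below 0 means the combined response is smaller in magnitude than the sum
    of the separate responses, i.e., suppression.
    """
    return (c1_target + c1_flankers) - c1_target_plus_flankers

# Hypothetical C1 amplitudes in microvolts.
additive = suppression_index(-1.0, -1.5, -2.5)    # responses sum linearly
suppressed = suppression_index(-1.0, -1.5, -2.0)  # combined response reduced

print(additive, suppressed)
```

In the first call the combined response equals the sum of the separate responses, so the index is 0. In the second call the combined response is weaker than the sum, so the index is negative.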

We further explored the link between the C1 suppression and 
the perceived crowding (rather than the physical stimuli). We 
first ranked the strength of crowding (i.e., the response accuracy 
difference between the Near_T+F and Far_T+F stimuli) in 20 
EEG blocks of the attended session for each subject. Then, these 
20 blocks were split into two groups: 10 blocks with the largest 
differences in the strong crowding group and the remaining 
blocks in the weak crowding group. For the strong and weak 
crowding groups, the mean accuracy differences were 20.1 ± 
1.61% and −0.20 ± 1.46%, respectively. Subjects viewed almost 
identical stimuli in the two groups (because the orientations of 
the flankers and target were randomized). The difference in the 
strength of crowding could then be attributed to the fluctuation 
of perceptual processing. Suppression indices were calculated for 
both groups. Only in the strong crowding group was the C1 suppression 
modulated by the target-flanker distance 
(strong crowding group: t(19) = 3.09, p < 0.01; weak crowding 
group: t(19) = 1.44, p = 0.17; Fig. 2E). The suppression indices 
were submitted to a repeated-measures ANOVA with crowding 
strength (strong and weak) and distance (far and nearby) as 
within-subject factors. We found a significant interaction between 
crowding strength and distance (F(1,19) = 5.88, p < 0.05). 
These results suggest a close relationship between the C1 suppres- 
sion and the perceived crowding. 
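
The block-splitting step used in this analysis can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code. The function name and the accuracy values are hypothetical, and crowding strength is taken as Far_T+F accuracy minus Near_T+F accuracy so that larger values mean stronger crowding.

```python
import numpy as np

def split_blocks_by_crowding(acc_far, acc_near):
    """Rank blocks by crowding strength (Far_T+F accuracy minus Near_T+F
    accuracy) and return the indices of the stronger and weaker halves."""
    strength = np.asarray(acc_far, dtype=float) - np.asarray(acc_near, dtype=float)
    order = np.argsort(-strength, kind="stable")  # descending strength
    half = len(order) // 2
    strong = np.sort(order[:half])
    weak = np.sort(order[half:])
    return strong, weak

# Hypothetical per-block accuracies for one subject (4 blocks for brevity;
# the experiment had 20 blocks, split 10 and 10).
acc_far = [0.80, 0.80, 0.80, 0.80]
acc_near = [0.60, 0.80, 0.50, 0.75]
strong_blocks, weak_blocks = split_blocks_by_crowding(acc_far, acc_near)
```

Suppression indices would then be computed separately from the EEG epochs belonging to each group of blocks.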

We also examined the effect of attention on C1 amplitude and 
latency. Paired t tests showed that there was no significant differ- 
ence between the attended and unattended sessions for all five 
stimuli. This result showed that, although attention could mod- 
ulate the interaction between the target and flankers, its effect on 
C1 amplitude and latency was very weak. 


Experiment 2: C1 suppression and the radial-tangential anisotropy 
The radial-tangential anisotropy, which refers to the phenomenon 
that radially positioned flankers can induce a stronger crowding ef- 
fect than tangentially positioned ones, is considered a diagnostic 
criterion of crowding (Whitney and Levi, 2011). In the second ERP 
experiment, we examined whether the C1 suppression was also related 
to the radial-tangential anisotropy. If this were the case, the C1 
suppression with radially positioned flankers should be stronger 
than that with tangentially positioned ones. This experiment also 
had five stimuli: target only (T), target with flankers positioned ra- 
dially (Rad_T+F), target with flankers positioned tangentially 
(Tan_T+F), radial flankers only (Rad_F), and tangential flankers 
only (Tan_F) (Fig. 1B). The procedure and data analysis were similar 
to those used in the first ERP experiment. 

In the attended session, subjects’ response accuracies with the 
T, Tan_T+F, and Rad_T+F stimuli were 85%, 81.9%, and 
71.2%, respectively (Fig. 1E). The performance differences between 
the stimulus conditions were significant, demonstrating 
that the presentation of flankers led to crowding (Tan_T+F vs T: 
t(19) = 3.72, p < 0.01; Rad_T+F vs T: t(19) = 8.76, p < 0.001) and 
that the radial-tangential anisotropy was evident (Rad_T+F vs 
Tan_T+F: t(19) = 8.08, p < 0.001). 

Figure 3A shows C1 topographies averaged over all the five 
stimuli in the unattended and attended sessions. The C1 compo- 
nent had a peak latency of ~78 ms. CP1, CPZ, P1, P3, and Pz had 
the largest C1 amplitudes. Dipole modeling confirmed that the 
intracranial source of the C1 component was located in V1 (Fig. 
3B; Table 1). Computed from the C1 amplitudes shown in Figure 
3C, the suppression indices were negative (Fig. 3D). In the attended 
session, the suppression index for the radial condition was 
significantly lower than that for the tangential condition (t(19) = 
2.55, p < 0.05), suggesting a stronger suppression with the radial 
flankers than with the tangential flankers, which is consistent 
with our prediction. However, in the unattended session, there 
was no significant difference between the two conditions (t(15) = 
0.29, p = 0.78). The suppression indices were submitted to a 
repeated-measures ANOVA with attention status (attended and 
unattended) and orientation (radial and tangential) as within-subject 
factors. We found a significant interaction between attention 
status and orientation (F(1,19) = 5.66, p < 0.05). 

Similar to Experiment 1, 20 EEG blocks were split into two 
groups: the strong radial-tangential anisotropy group and the 
weak radial-tangential anisotropy group. For the two groups, the 
mean response accuracy differences between the Rad_T+F and 
Tan_T+F stimuli (i.e., the magnitude of the radial-tangential 
anisotropy) were 20.5 ± 1.32% and 0.85 ± 1.35%, respectively. 
Only in the strong anisotropy group was the suppression index for 
the radial condition significantly lower than that for the tangential 
condition (strong group: t(19) = 2.97, p < 0.01; weak group: 
t(19) = 1.38, p = 0.18; Fig. 3E). The suppression indices were 
submitted to a repeated-measures ANOVA with anisotropy 
strength (strong and weak) and orientation (radial and tangential) 
as within-subject factors. We found a significant interaction 
between anisotropy strength and orientation (F(1,19) = 4.49, p < 
0.05). Attention also had little effect on C1 amplitude and latency 
for all the five stimuli in this experiment. Overall, these results 
suggest that the C1 suppression closely mirrors the radial-tangential 
anisotropy of crowding. 


Experiment 3: cortical suppression and the target-flanker distance 
Although the C1 suppression found in the ERP experiments suggests 
an early V1 contribution to crowding, the role of intermediate and 
high cortical areas in crowding is still unclear. Parallel to the ERP 
experiments, two event-related fMRI experiments were designed to 
investigate the relationships between cortical suppression in different 
visual areas and the target-flanker distance (Experiment 3) as 
well as the radial-tangential anisotropy (Experiment 4). 

Because the target and flankers were small and were presented
in the periphery, it is difficult to use fMRI to separate their cortical
representations and directly measure the effect of crowding on
the representation of the target. We modified the paradigm developed
by Kastner et al. (1998) and Beck and Kastner (2005) to
solve this problem. Experiment 3 had two trial types (conditions):
far and nearby trials. In a far trial, the Far_T+F and
Near_F stimuli were presented successively in a random order,
each for 0.25 s. In a nearby trial, the Near_T+F and Far_F stimuli
were presented in the same way (Fig. 4A). Integrated over time, 
the physical stimulations in each location of the target and flank- 
ers were identical in the two conditions. However, relative to the 
far condition, subjects should experience a stronger crowding 
effect in the nearby condition because the target was presented 
with the nearby flankers. Subjects underwent two sessions: the 
attended session and the unattended session. In the attended 

Figure 3. ERP results for the attended and unattended sessions in Experiment 2. A, C1 topographies in response to the first stimulus averaged over all five stimulus conditions and all subjects. Posterior electrodes, including CP1, CPZ, P1, P3, and Pz (within the black ellipse), had the largest C1 amplitudes. B, Locations of a single dipole that best accounted for the variance in the C1 scalp voltage distribution. C, ERPs averaged over the five electrodes and all subjects for each stimulus condition. D, Suppression indices when the target was presented with radial or tangential flankers. E, Suppression indices obtained from half of the EEG data in the attended session exhibited a strong behavioral crowding effect (radial vs tangential), and the other half exhibited a weak effect. *p < 0.05, statistically significant difference between stimulus conditions. **p < 0.01, statistically significant difference between stimulus conditions. Error bars indicate 1 SEM across subjects.


session, subjects performed the same orientation
discrimination task with the target as in the ERP
experiments. As predicted, their performance was
better in the far condition (72.1%) than in the
nearby condition (61.7%) (t(9) = 4.13, p < 0.01;
Fig. 4B). In the unattended session, subjects were
asked to ignore the stimuli and detect a brief
luminance change at the fixation point.
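
The logic of the trial design can be checked with a small sketch: summed over the two stimuli in a trial, every location receives the same stimulation in the far and nearby conditions, and only the flankers shown together with the target differ. The location labels and function names below are hypothetical, and the stimulus contents are an assumption inferred from the stimulus names (T+F for target plus flankers, F for flankers alone).

```python
from collections import Counter

# Assumed contents of each stimulus, inferred from its name.
STIMULI = {
    "Far_T+F": ["target", "far_flankers"],
    "Near_T+F": ["target", "near_flankers"],
    "Far_F": ["far_flankers"],
    "Near_F": ["near_flankers"],
}

# The two stimuli presented successively in each trial type.
TRIALS = {
    "far": ["Far_T+F", "Near_F"],
    "nearby": ["Near_T+F", "Far_F"],
}


def integrated_stimulation(trial):
    """Count how often each location is stimulated over a whole trial."""
    counts = Counter()
    for stimulus in TRIALS[trial]:
        counts.update(STIMULI[stimulus])
    return counts


def flankers_shown_with_target(trial):
    """Return the flanker locations presented simultaneously with the target."""
    for stimulus in TRIALS[trial]:
        locations = STIMULI[stimulus]
        if "target" in locations:
            return sorted(loc for loc in locations if loc != "target")
    return []


for trial in TRIALS:
    print(trial, dict(integrated_stimulation(trial)), flankers_shown_with_target(trial))
```

Because the time-integrated stimulation is matched, a BOLD difference between the two trial types is more plausibly attributed to target-flanker interaction than to the physical stimuli.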

ROIs were defined as cortical areas representing
the locations of the target and flankers (Fig. 4C)
in V1, V2, V3, V4, LO, and IPS. We analyzed BOLD
signals in these ROIs in the nearby and far
conditions (Fig. 5A). Any signal difference between
the two conditions might be largely the result of
different levels of cortical suppression between
the target and flankers, rather than the physical
stimuli per se (Beck and Kastner, 2005). We defined
a suppression index as
(BOLD_far - BOLD_near)/(BOLD_far + BOLD_near),
where BOLD_far and BOLD_near are the peak amplitudes
of BOLD signals in the far and nearby conditions,
respectively. If the mutual suppression between the
target and flankers in the nearby condition is
stronger than that in the far condition, BOLD_far
should be larger than BOLD_near. Thus, the
suppression index should be above zero; the

Figure 4. Design and psychophysical result for Experiment 3. A, Design. The target was presented either with far flankers (top) or with nearby flankers (bottom). Subjects performed an orientation discrimination task with the target in the attended session or detected a luminance change of the fixation point in the unattended session. B, Psychophysical result for the far and nearby trials. C, A sample image for ROI localization. **p < 0.01, statistically significant difference between the far and nearby trials. Error bars indicate 1 SEM across subjects.

Figure 5. fMRI results for Experiment 3. A, Event-related BOLD signals averaged across subjects for the far and nearby trials in the attended session. B, Suppression indices in the attended session. *p < 0.05, suppression index is significantly above zero. **p < 0.01, suppression index is significantly above zero. ***p < 0.001, suppression index is significantly above zero. C, Suppression indices in the unattended session. D, Suppression indices obtained from half of the fMRI data in the attended session exhibited a strong behavioral crowding effect (nearby vs far), and the other half exhibited a weak effect. **p < 0.01, statistically significant difference between the two indices in V1. Error bars indicate 1 SEM across subjects.


larger the index, the stronger the suppression. We found that, in
the attended session, the suppression indices were significantly
larger than zero in V1 (t(9) = 6.58, p < 0.001), V2 (t(9) = 4.58, p <
0.01), and V4 (t(9) = 3.58, p < 0.05) (Fig. 5B). V1 had the largest
index, which was significantly larger than those in LO (t(9) = 3.80,
p < 0.05) and IPS (t(9) = 3.70, p < 0.05). In the unattended
session, no area showed a significantly positive index (all t(9) <
1.23, p > 0.252) (Fig. 5C). For all the ROIs, we also performed a
repeated-measures ANOVA of the peak amplitudes with attention
status (attended and unattended) and distance (far and
nearby) as within-subject factors. Only V1 exhibited a significant
interaction between attention status and distance (F(1,9) = 20.02,
p < 0.01), which is generally in line with the t test results.
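
As a minimal sketch of the index defined above, the code below computes a per-subject suppression index from peak BOLD amplitudes and the one-sample t statistic against zero. The function names and the amplitude values are hypothetical and are not data from this study; p values are omitted because they would require a t distribution from a statistics library.

```python
import math
from statistics import mean, stdev


def suppression_index(bold_far, bold_near):
    """(BOLD_far - BOLD_near) / (BOLD_far + BOLD_near) for one subject and ROI."""
    return (bold_far - bold_near) / (bold_far + bold_near)


def one_sample_t(values, mu=0.0):
    """Return (t statistic, degrees of freedom) for a test of the mean against mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / math.sqrt(n))
    return t, n - 1


# Hypothetical peak amplitudes (% signal change) for five subjects in one ROI.
peaks_far = [1.20, 0.95, 1.10, 1.30, 1.05]
peaks_near = [1.00, 0.90, 0.98, 1.05, 1.00]

indices = [suppression_index(f, n) for f, n in zip(peaks_far, peaks_near)]
t_value, dof = one_sample_t(indices)
print(f"mean index = {mean(indices):.3f}, t({dof}) = {t_value:.2f}")
```

Normalizing by the sum of the two amplitudes keeps the index between -1 and 1 when both amplitudes are positive, so ROIs with different overall signal levels can be compared.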
Because the cortical suppression in V1, V2, and V4 could be
modulated by the target-flanker distance, similar to the ERP
experiments, we further investigated the association between the
suppression in these areas and the perceived crowding. We first
ranked the strength of crowding (i.e., the response accuracy
difference between the nearby and far conditions) in eight fMRI
runs for each subject. Then, these eight runs were split into two
groups: four runs with the largest differences in the strong crowding
group and the remaining runs in the weak crowding group. For the
strong and weak crowding groups, the mean accuracy differences
were 17.13 ± 2.66% and 3.68 ± 2.50%, respectively. Suppression
indices were calculated for both groups in V1, V2, and V4. Only in
V1 was the suppression index for the strong crowding group
significantly larger than that for the weak crowding group
(t(9) = 4.861, p < 0.01) (Fig. 5D). For the three ROIs, we also
performed a repeated-measures ANOVA of the peak amplitudes with
crowding strength (strong and weak) and distance (far and nearby) as
within-subject factors. Only V1 exhibited a significant interaction
between crowding strength and distance (F(1,9) = 7.03, p < 0.05).
These results demonstrate that the cortical suppression in V1 was
closely associated with the magnitude of the crowding effect.
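
The run-splitting procedure described above can be sketched as follows: for one subject, runs are ranked by the far-minus-nearby accuracy difference, and the four runs with the largest differences form the strong crowding group. The function name and the accuracy values are hypothetical, and how tied runs were handled in the study is not stated; here ties keep their original run order.

```python
def split_runs_by_crowding(acc_far, acc_near, n_strong=4):
    """Split run indices into strong and weak crowding groups for one subject.

    acc_far and acc_near hold the per-run accuracy (in %) for the far and
    nearby conditions. Crowding strength is the far-minus-nearby difference.
    Returns (strong_runs, weak_runs) as sorted lists of run indices.
    """
    diffs = [far - near for far, near in zip(acc_far, acc_near)]
    # sorted() is stable, so tied runs stay in their original order.
    ranked = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    return sorted(ranked[:n_strong]), sorted(ranked[n_strong:])


# Hypothetical accuracies (%) for eight runs of one subject.
acc_far = [80, 70, 90, 60, 75, 70, 85, 65]
acc_near = [60, 68, 60, 60, 50, 69, 70, 60]

strong_runs, weak_runs = split_runs_by_crowding(acc_far, acc_near)
print("strong:", strong_runs, "weak:", weak_runs)
```

Suppression indices would then be computed separately from the BOLD data of each group of runs.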


Experiment 4: cortical suppression and the radial-tangential anisotropy

The design of Experiment 4 was identical to that of Experiment 3,
except that different stimuli were used (Rad_T+F, Tan_T+F, Rad_F, and
Tan_F). Experiment 4 had two trial types (conditions): radial and
tangential trials. In a radial trial, the Rad_T+F and Tan_F stimuli
were presented. In a tangential trial, the Tan_T+F and Rad_F
stimuli were presented (Fig. 6A). Integrated over time, the physical
stimulations in each location of the target and flankers were
identical in the two conditions. Consistent with the radial-tangential
anisotropy prediction, subjects' performance on the orientation
discrimination task with the target was better in the
tangential condition (72.6%) than in the radial condition
(62.9%) (t(9) = 4.17, p < 0.01; Fig. 6B).

ROIs were cortical areas representing the locations of the target
and flankers in V1-V4, LO, and IPS (Fig. 6C). A suppression index
was defined as (BOLD_tan - BOLD_rad)/(BOLD_tan + BOLD_rad), where
BOLD_tan and BOLD_rad are the peak amplitudes of BOLD signals in
the tangential and radial conditions, respectively (Fig. 7A). If the
mutual suppression between the target and flankers in the radial
condition is stronger than that in the tangential condition,
BOLD_tan should be larger than BOLD_rad. Thus, the suppression
index should be above zero; the larger the index, the stronger
the suppression. We found that, in the attended session, V1 had
the largest index and only the index in V1 was significantly larger
than zero (V1: t(9) = 4.58, p < 0.01; Fig. 7B). In the unattended
session, no area showed a significantly positive index (all
t(9) < 1.22, p > 0.25) (Fig. 7C). For all the ROIs, we also
performed a repeated-measures ANOVA of the peak amplitudes with
attention status (attended and unattended) and orientation
(radial and tangential) as within-subject factors. V1 and V2
exhibited a significant interaction between attention status and
orientation (both F(1,9) > 8.46, p < 0.05), which is generally in
line with the t test results.

Similar to Experiment 3, we ranked the strength of the
radial-tangential anisotropy (i.e., the response accuracy
difference between the radial and tangential conditions) in eight
fMRI runs for each subject, then split these eight runs into the
strong anisotropy group and the weak anisotropy group, with four
runs in each group. For the strong and weak anisotropy groups, the
mean accuracy differences were 17.13 ± 1.95% and 2.24 ± 2.75%,
respectively. Suppression indices were calculated for both groups
in V1. The suppression index for the strong anisotropy group was
significantly larger than that for the weak anisotropy group
(t(9) = 3.06, p < 0.05) (Fig. 7D). We also performed a
repeated-measures ANOVA of the peak amplitudes with crowding
strength (strong and weak) and orientation (radial and tangential)
as within-subject factors. V1 exhibited a significant interaction
between crowding strength and orientation (F(1,9) = 16.22,
p < 0.01). These results demonstrate a tight coupling between the
cortical suppression in V1 and the radial-tangential anisotropy of
crowding.

Discussion

With a combination of ERP and fMRI approaches, we demon- 
strated that the orientation crowding effect was closely associated 
with the inhibitory interaction between the target and flankers, as 
manifested in the suppression of the C1 component and the V1 
BOLD signal. Furthermore, the suppression was largely depen- 
dent on spatial attention. These results strongly suggest that 
attention-dependent V1 suppression contributes to crowding at 
a very early stage of visual processing. 

Our findings are of unique significance to understanding the 
neural mechanisms of crowding. First, we provide the first piece 
of neurophysiological evidence regarding the temporal evolution 
of crowding, which goes significantly beyond previous fMRI 
studies (Fang and He, 2008; Bi et al., 2009; Freeman et al., 2011; 
Anderson et al., 2012; Millin et al., 2013). The very short peak
latency (77-78 ms) of the C1 component unequivocally supports the
view that crowding originates in early visual cortex, as early as V1.
Second, we not only show that the early cortical suppression is
associated with the target-flanker distance and the radial-tangential
anisotropy but also demonstrate a close link between the

Figure 6. Design and psychophysical result for Experiment 4. A, Design. The target was presented either with tangential flankers (top) or with radial flankers (bottom). Subjects performed an orientation discrimination task with the target in the attended session or detected a luminance change of the fixation point in the unattended session. B, Psychophysical result for the tangential and radial trials. C, A sample image for ROI localization. **p < 0.01, statistically significant difference between the tangential and radial trials. Error bars indicate 1 SEM across subjects.


suppression and the perceived crowding. Third, our evidence is
strong and converging. The fMRI observation that V1 is the only
area in which the suppression is tightly tied to the strength of the
perceived crowding suggests that the crowding-related BOLD signal
in V1 is unlikely to reflect feedback from higher cortical areas,
consistent with the ERP findings.

In a very recent fMRI study, Millin et al. (2013) manipulated
the target-flanker distance to modulate the strength of crowding.
They found that crowding induced BOLD signal suppression in
V1, even when subjects were performing a fixation task and did not
pay attention to the stimuli. However, we failed to find such
suppression in the unattended session of Experiment 3. Our experiments
differ from theirs in many respects, including stimulus,
experimental design, and data analysis. Their stimuli were presented
closer to fixation and for longer than ours, which could induce stronger
BOLD signals. The block design they used is more effective at
detecting BOLD signal changes than the event-related design we used
here. Taking all this evidence into account, we suggest that the
crowding-induced cortical suppression could be modulated by
attention, rather than depending completely on attention.

What is the nature of the cortical suppression? One possibility 
is that the suppression occurs when both the flankers and target 
fall into a large receptive field of a neuron. However, if this is the 
case, we should have observed consistent and reliable suppression 
in V4, LO, and IPS (rather than V1) because the receptive fields of 
neurons in these areas are large enough to accommodate the 
target and flankers (Smith et al., 2001; Dumoulin and Wandell, 
2008). A second possibility is that the suppressive interaction 
occurs through long-range horizontal connections between dif- 
ferent populations of neurons that respond to the flankers and 
target. Stettler et al. (2002) showed that the horizontal connec- 
tions cover portions of V1 representing regions of visual space up 
to eight times larger than classical receptive fields. Given that the 

Figure 7. fMRI results for Experiment 4. A, Event-related BOLD signals averaged across subjects for the tangential and radial trials in the attended session. B, Suppression indices in the attended session. **p < 0.01, suppression index is significantly above zero. C, Suppression indices in the unattended session. D, Suppression indices obtained from half of the fMRI data in the attended session exhibited a strong behavioral crowding effect (radial vs tangential), and the other half exhibited a weak effect. *p < 0.05, statistically significant difference between the two indices in V1. Error bars indicate 1 SEM across subjects.


receptive field of neurons in V1 at 8° eccentricity is ~0.7° (Smith
et al., 2001), the spatial extent influenced by the horizontal
connections from these neurons is up to 5.6° (0.7° × 8), which is large
enough to cover two gratings (2 × 2.36°) in the current study.
The possible involvement of the horizontal connections suggests 
that crowding is unavoidable because of the intrinsic structure of 
the peripheral visual system. In some cases, the long-range hori- 
zontal connections could help detect contours (Li et al., 2006), 
but in other cases, they might exert deleterious influence on fea- 
ture extraction and consequently lead to crowding. 
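
The spatial-extent argument in this paragraph rests on a simple comparison, written out here using only the values quoted in the text (receptive field size of ~0.7°, a factor of eight for the horizontal connections, and a grating extent of 2.36°):

```latex
\[
0.7^\circ \times 8 = 5.6^\circ \;>\; 2 \times 2.36^\circ = 4.72^\circ
\]
```

On these figures, the region reached by the horizontal connections exceeds the combined extent of two gratings by roughly 0.9°.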

Unlike a recent study that proposed the preattentive early 
cortical interaction explanation for crowding (Millin et al., 2013), 
our study emphasizes the importance of spatial attention in 
crowding. This is in line with a psychophysical study demonstrat- 
ing that collinear facilitation by flankers was significant only 
when the flankers were attended to (Freeman et al., 2001). Atten- 
tional modulation in visual cortex is usually thought to be imple- 
mented through feedback from the frontoparietal attentional 
network and manifested in the intermediate and late stages of 
neuronal responses (Corbetta and Shulman, 2002). Although 
fMRI studies have shown that spatial attention can modulate 
BOLD signals in V1 (Gandhi et al., 1999; Liu et al., 2005), a large
body of neurophysiological evidence supports the view that the 
earliest signals in V1 and the C1 component are not affected by 
spatial attention (Clark and Hillyard, 1996; Martinez et al., 1999; 
Ding et al., 2013; but see also Kelly et al., 2008). Consistent with 
the neurophysiological findings, we also failed to find attentional
modulation of the amplitude and latency of the C1 components
evoked by the target and flankers themselves. However, our
observation that attention can modulate the suppressive interaction
between the target and flankers as early as 77 ms after stimulus 
onset is intriguing. Feedback mechanisms time-locked to stimu- 
lus onset cannot readily explain this finding. Gilbert et al. (2000) 
and Li et al. (2004) showed that attention can modulate contex- 
tual influences through the horizontal connections in V1 and that 
the modulation started from the very beginning (~70 ms) of the 
time course of V1 neuronal responses. These findings are in line 
with our results. In our study, subjects were instructed to pay 
attention to the stimuli throughout the whole attended session. 
We speculate that sustained spatial attention might alter the func- 
tional status of the horizontal connections in the session, leading 
to cortical suppression at a very early processing stage. 

Our fMRI experiments showed that only the cortical suppression
in V1 (but not other ROIs) was associated with both the
target-flanker distance and the radial-tangential anisotropy.
This is consistent with a recent computational model proposed
by Nandy and Tjan (2012), which attributes crowding to
saccade-confounded image statistics encoded in lateral connections
between V1 hypercolumns. The model can explain most
of the important characteristics of crowding, including Bouma's
law, the inward-outward asymmetry of the crowding zone,
and the radial-tangential anisotropy. However, we should not
preclude the possibility that crowding occurs at multiple levels in 
the visual system. For example, Louie et al. (2007) demonstrated 
holistic crowding between high-level face representations, suggesting
that face-selective areas (e.g., fusiform face area) might
play a role in this kind of crowding (Louie et al., 2007). In the
future, it would be important to investigate whether our conclu- 
sion can be generalized to other conditions and stimuli. 